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This  article  presents  an  algorithm  for  constructing  orthogonal  Latin  hypercubes,  given  a  fixed  sample  size, 
in  more  dimensions  than  previous  approaches.  In  addition,  we  detail  a  method  that  dramatically  improves 
the  space-filling  properties  of  the  resultant  Latin  hypercubes  at  the  expense  of  inducing  small  correlations 
between  the  columns  in  the  design  matrix.  Although  the  designs  are  applicable  to  many  situations,  they 
were  developed  to  provide  E>epartment  of  Defense  analysts  flexibility  in  fitting  models  when  exploring 
high-dimensional  computer  simulations  where  there  is  considerable  a  priori  uncertainty  about  the  forms 
of  the  response  surfaces. 
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1.  INTRODUCTION 

Physical  experimentation  can  be  resource  intensive  in  terms 
of  material,  time,  and  money.  Therefore,  the  United  States  De¬ 
partment  of  Defense  (DoD)  often  relies  on  simulation  models  in 
its  decision  making.  Among  other  things,  simulation  models  are 
used  to  help  test  war  plans,  decide  what  equipment  to  acquire, 
determine  weapon  mixes,  and  study  doctrine  and  potential  op¬ 
erational  concepts  (see,  e.g.,  Appleget  1995;  Loerch,  Pudwill, 
and  LaBarbera  1996;  and  the  Defense  Modeling  and  Simula¬ 
tion  Office  website:  www.dmso.mil).  Because  there  is  a  dearth 
of  data  with  which  to  assess  the  veracity  of  many  of  these  sim¬ 
ulations,  especially  at  the  force-on-force  level  or  when  inves¬ 
tigating  future  scenarios,  they  are  often  used  in  an  exploratory 
matmer.  That  is,  the  simulations  are  used  to  assist  senior  leader¬ 
ship  in  understanding  and  reasoning  about  extremely  complex 
systems  and  processes  (Bankes  1993). 

The  simulations  that  DoD  analysts  use  are  often  quite  large 
and  almost  unimaginably  complex.  Many  contain  thousands  of 
input  variables,  a  substantial  number  of  which  may  be  signifi¬ 
cant.  Moreover,  many  of  the  input  variables  (e.g.,  with  whom, 
where,  and  how  a  future  conflict  may  take  place)  are  uncer¬ 
tain.  Furthermore,  as  Saeger  and  Hinch  (2(X)1)  showed,  the  re¬ 
sponse  surfaces  may  be  highly  nonlinear.  In  addition,  due  to  the 
stochastic  nature  of  combat,  most  of  the  simulations  include 
pseudo-random  events.  Unfortunately,  the  complexity  and  un¬ 
certainty  associated  with  these  simulations  makes  using  strong 
prior  knowledge,  such  as  the  distributional  form  of  the  error 
term,  unreliable.  To  efficiently  explore  these  simulations,  we 
want  readily  available  experimental  designs  that  allow  us  to 
screen  a  large  number  of  input  variables,  fit  commonly  used 
main-effects  models  with  nearly  uncorrelated  coefficient  esti¬ 
mates,  while  providing  flexibility  to  fit  complex  models  (includ¬ 
ing  nonparametric)  on  a  modest  number  of  dominant  factors. 

To  address  this  goal,  this  article  extends  Ye’s  (1998a)  algo¬ 
rithm  for  constructing  orthogonal  Latin  hypercubes  (OLHs)  to 
allow  more  factors  to  be  included  within  a  fixed  sample  size. 
Unfortunately,  the  space-filling  properties  of  these  designs  can 
be  poor.  A  good  space-filling  design  is  one  in  which  the  de¬ 
sign  points  are  scattered  throughout  the  experimental  region 


with  minimal  unsampled  regions.  To  rectify  this  potential  short¬ 
coming,  we  present  a  computer-intensive  algorithm  to  generate 
Latin  hypercubes  (LHs)  that  have  good  space-filling  properties 
and  are  nearly  orthogonal;  for  example,  all  correlations  between 
the  columns  in  the  design  matrix  are  in  the  interval  (-.03,  .03). 
It  takes  extensive  time  to  generate  these  designs  using  our  algo¬ 
rithm;  therefore,  a  catalogue  of  ready-to-use,  nearly  orthogonal 
and  good  space-filling  designs  for  up  to  22  factors  in  as  few 
as  129  runs  has  been  given  by  Cioppa  (2002)  and  available  for 
download  at  http://harvest.nps.edu.  The  methodology  used  to 
construct  these  designs  can  be  applied  to  generate  designs  for 
more  than  22  factors. 

The  article  is  organized  as  follows.  Section  2  provides  the 
background  and  frames  the  issues  that  our  designs  address.  Sec¬ 
tion  3  details  our  extension  of  Ye’s  algorithm  to  allow  more 
variables  in  a  fixed-sample  size  OLH.  But  because  these  designs 
may  have  poor  space-filling  properties,  we  present  an  algorithm 
in  Section  4  that  provides  dramatic  improvement  in  the  space¬ 
filling  properties  by  slightly  relaxing  the  orthogonality  require¬ 
ment.  Section  5  discusses  approaches  that  we  have  found  use¬ 
ful  in  extending  these  designs  to  situations  requiring  different 
combinations  of  factors  and  sample  sizes.  Section  6  illustrates 
the  use  of  the  22-variable  design  in  an  agent-based  simulation 
study  of  a  military  peace  enforcement  scenario.  Section  7  gives 
conclusions  and  suggests  directions  for  future  research. 

2.  BACKGROUND 

In  this  section  we  define  our  notation,  discuss  building  meta¬ 
models  to  better  understand  the  relationship  between  simulation 
inputs  and  outputs,  briefly  review  some  designs  frequently  used 
in  simulation  experiments,  and  examine  the  measures  that  we 
use  to  assess  the  quality  of  our  designs. 
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2.1  Meta-Models 

Consider  the  situation  in  which  a  simulation  model  contains  k 
continuous  input  variables  that  we  wish  to  explore  with  n  com¬ 
putational  experiments  over  a  rectangular  region.  Suppose  that 
the  model  generates  a  vector  of  output  responses  denoted  as  y. 
Let  the  ith  input  variable  be  denoted  as  x,,  with  X  represent¬ 
ing  the  n  X  ^  input  design  matrix  and  yj  representing  an  output 
response  from  the  simulation.  To  help  us  understand  our  simu¬ 
lation  models,  we  often  build  meta-models  to  quantify  the  re¬ 
lationship  between  the  input  variables  {x\,X2, . . .  ,Xk)  and  the 
output  measures.  A  meta-model  is  a  relatively  simple  function, 
g,  compared  with  the  original  simulation,  which  is  constructed 
given  an  experimental  design  and  the  corresponding  responses. 
A  good  meta-model  is  one  in  which  g  makes  parsimonious  use 
of  the  variables  available  and  the  errors  (i.e.,  the  differences  be¬ 
tween  the  meta-model  and  the  simulation  output)  are  small. 

One  of  the  simplest  and  most  commonly  used  meta- models 
is  one  in  which  g  is  a  linear  combination  of  the  inputs,  that  is, 

k 

g(x)  =  ^o  +  X^|0,x,-.  (1) 

/=l 

When  estimating  the  coefficients  in  ( 1 ),  the  precision  of  the 
estimates  can  be  adversely  affected  by  collinearity  among  the 
input  variables  (Ryan  1997).  If  the  columns  in  the  design  ma¬ 
trix  between  input  variables  x,  and  Xj  are  uncorrelated,  then  the 
regression  estimates  of  and  Pj  in  ( 1 )  are  uncorrelated. 

For  many  simulations,  a  linear  meta-model  may  not  suffi¬ 
ciently  characterize  the  response  surface.  Unfortunately,  it  takes 
many  more  observations  to  estimate  meta-models  with  curvilin¬ 
ear  and  interaction  terms.  For  example,  suppose  that  g  includes 
quadratic  and  bilinear  interaction  effects,  as  well  as  the  linear 
terms,  that  is, 

k  2k  k-\  k 

g(x)  =  /3o-t-^Ax,-l-  X!  + 

1=1  I=<:+1  /=1  j>i 

To  have  sufficient  degrees  of  freedom  to  estimate  the  coef¬ 
ficients  in  (2),  the  number  n  of  simulation  runs  must  satisfy 
n>k-|-k-|-(2)-l-l.  Therefore,  n  must  grow  on  the  order  of 
k^.  More  complicated  meta-models  require  even  larger  n.  How¬ 
ever,  in  practice,  typically  only  a  small  percentage  of  the  input 
variables  turn  out  to  be  significant.  For  example,  in  an  empirical 
simulation  screening  experiment,  using  a  deterministic  ecolog¬ 
ical  model,  Bettonvil  and  Kleijnen  (1997)  found  that  only  15 
of  281  factors  considered  had  effect  sizes  in  their  linear  with 
two-factor  interactions  meta-model  above  a  certain  threshold. 
Furthermore,  when  constructing  meta-models,  it  is  often  rea¬ 
sonable  to  focus  the  higher-order  terms  by  assuming  that  sig¬ 
nificant  interactions  and  nonlinear  terms  likely  include  factors 
that  tested  significant  when  fitting  a  linear  response. 

2.2  Designs  Commonly  Used  to  Explore  Simulations 

Various  designs  are  available  for  practitioners  to  use  in  sim¬ 
ulation  studies  (Kleijnen,  Sanchez,  Lucas,  and  Cioppa  2005). 
The  appropriateness  of  the  various  designs  depends  on  many 
factors,  including  the  number  of  variables  that  one  wishes  to 
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explore,  the  number  of  computational  experiments  that  are  fea¬ 
sible,  the  types  of  meta-models  that  one  wishes  to  be  able  to 
analyze,  whether  iterations  between  the  experiments  and  the 
analysis  are  possible,  a  priori  assumptions  on  the  response,  and 
statistical  criterion  over  which  the  analysts  want  to  optimize. 
We  are  especially  interested  in  cases,  such  as  those  often  found 
in  defense  analysis,  in  which  there  may  be  multiple  responses 
of  interest  and  little  a  priori  knowledge  about  the  forms  that 
the  response  function  may  take.  Thus,  we  adopt  the  principle  of 
Santner,  Williams,  and  Notz  (2003)  for  selecting  designs  that 
“allow  one  to  fit  a  variety  of  models  and  provide  information 
about  all  portions  of  the  experimental  region.”  Specifically,  we 
desire  to  simultaneously  be  able  to  efficiently  fit  linear-effects 
models  over  many  variables  (often  for  screening  purposes)  and 
quite  complex  models  on  a  handful  of  dominant  factors — all 
within  a  constrained  number  of  runs. 

Perhaps  the  most  used  exploratory  designs  are  factorial  and 
fractional-factorial  designs.  Unfortunately,  the  number  of  runs 
necessary  increases  dramatically  as  the  number  of  factors  or 
levels  increases.  Highly  fractionated  (e.g.,  Plackett  and  Bur- 
man  1946)  and  supersaturated  (Lin  1993)  designs  can  mitigate 
the  “curse  of  dimensionality”;  however,  they  have  extremely 
poor  space-filling  properties  and  severely  limit  the  meta-models 
that  an  analyst  can  examine.  Thus  they  are  not  sufficient  in 
and  of  themselves  for  situations  in  which  the  response  may  be 
complex — for  example,  with  substantial  nonlinearities,  higher- 
order  interactions,  and  changepoints. 

Designs  commonly  used  in  response  surface  methodology 
(see  Meyers  and  Montgomery  2002),  such  as  central  composite 
designs  (CCDs),  are  excellent  designs  for  identifying  quadratic 
and  interaction  terms.  Furthermore,  by  using  highly  fraction¬ 
ated  two-level  designs  in  constructing  a  CCD,  the  number  of 
runs  required  grows  modestly  (on  the  order  of  k^)  with  the  num¬ 
ber  variables.  However,  these  designs  do  not  have  good  space¬ 
filling  properties  when  they  are  projected  into  the  subspaces  de¬ 
termined  by  a  small  number  of  input  variables.  For  example, 
with  only  five  levels  per  factor,  CCDs  lack  the  granularity  pro¬ 
vided  by  designs  such  as  LHs,  which  limits  their  effectiveness 
with  such  exploratory  techniques  as  regression  trees. 

Group  screening  (Dorfman  1943)  and  sequential  bifurcation 
(Bettonvil  1995)  designs  are  able  to  screen  a  numerous  factors 
(e.g.,  >200)  in  a  relatively  few  number  of  runs.  However,  these 
approaches  require  factor  sparsity  and  assume  a  priori  known 
directional  effects  of  the  factors.  Chaloner  and  Verdinalli  ( 1 994) 
provided  an  excellent  expository  on  Bayesian  experimental  de¬ 
sign  and  noted  that  Bayesian  statistics  can  be  effective  in  de¬ 
termining  a  design  and  input  parameter  values.  A  deficiency 
of  these  designs  is  their  reliance  on  expert  opinion  and  a  pri¬ 
ori  beliefs.  This  is  particularly  worrisome  in  complex  high¬ 
dimensional  problems  for  which  little  expert  consensus  exists. 
Srivastava’s  (1975)  search  linear  models  assume  that  the  num¬ 
ber  of  important  factors  is  known  a  priori.  This  would  be  a 
rather  rare  phenomenon,  especially  in  defense  analyses.  Fre¬ 
quency  domain  designs  (Schruben  1986)  are  capable  of  iden¬ 
tifying  complex  relationships  between  input  factors  and  output 
responses  but  use  nonterminating  simulations,  which  are  un¬ 
common  in  defense  analyses. 

Many  different  criteria  exist  for  constructing  designs.  Sant¬ 
ner  et  al.  (2003)  provided  a  thorough  description  of  entropy. 
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mean  squared  prediction  error,  and  multiple  optimization  crite¬ 
ria.  Johnson,  Moore,  and  Ylvisaker  (1990)  described  minimax 
and  maximin  distance  designs.  Potential  drawbacks  of  these 
methods  include  the  extensive  time  that  may  be  needed  to  con¬ 
struct  these  designs  based  on  the  specified  criterion,  and  the  fact 
that  orthogonality  may  not  exist. 

Two  promising  classes  of  designs  are  LH  designs — par¬ 
ticularly  the  orthogonal  variants — and  uniform  designs.  A  brief 
description  of  each  follows. 

In  LH  sampling,  the  input  variables  are  considered  to  be  ran¬ 
dom  variables  with  known  distribution  functions.  For  each  input 
variable  x,  ,  “all  portions  of  its  distribution  [are]  represented  by 
input  values”  by  dividing  its  range  into  “n  strata  of  equal  mar¬ 
ginal  probability  1  /n,  and  [sampling]  once  from  each  stratum” 
(McKay,  Beckman,  and  Conover  1979).  In  practice,  many  ana¬ 
lysts  take  a  fixed  value  in  each  stratum  (e.g.,  the  median)  rather 
than  a  random  value  or  sample  from  a  discrete  uniform  distrib¬ 
ution  (see  Koehler  and  Owen  1996).  For  each  x/,  the  n  sampled 
input  values  are  assigned  at  random  to  the  n  cases,  with  all  n\ 
possible  permutations  being  equally  likely.  This  determines  the 
column  in  the  design  matrix  for  x,  and  is  done  independently 
for  each  of  the  k  input  variables.  Therefore,  for  each  variable 
Xi,  all  of  the  n  input  values  appear  once  and  only  once  in  the 
design.  Also,  for  a  given  row  in  the  design  matrix,  all  of  the 
n*  potential  combinations  of  the  input  variable  values  have  an 
equal  chance  of  occurring.  A  great  strength  of  basic  LHs  is  that 
they  are  easy  to  generate  for  any  k  and  n;  indeed,  many  simu¬ 
lation  software  packages  include  them.  When  the  input  factors 
are  given  uniform  distributions,  LHs  tend  to  have  reasonable 
space-filling  properties.  In  fact,  when  a  discrete  uniform  is  used 
on  the  input  variables,  the  one-dimensional  projections  are  op¬ 
timum  space-filling  designs.  Furthermore,  when  the  number  of 
input  combinations  («)  is  sufficiently  large  with  respect  to  the 
number  of  input  variables  (k),  there  likely  will  be  small  corre¬ 
lations  between  columns  in  the  design  matrix  (see  Owen  1994; 
Lucas,  Sanchez,  Brown,  and  Vinyard  2002).  When  n  is  not  large 
with  respect  to  k,  somewhat  restrictive  OLHs  do  exist;  we  dis¬ 
cuss  this  later. 

Fang,  Lin,  Winker,  and  Zhang  (2000a)  defined  a  uniform  de¬ 
sign  as  one  “that  allocates  experimental  points  [which  are]  uni¬ 
formly  scattered  on  the  domain”;  that  is,  they  are  good  space¬ 
filling  designs.  Uniform  designs  can  be  particularly  useful  when 
identifying  thresholds  or  fitting  nonparametric  surfaces  (Fang 
and  Wang  1994).  However,  these  designs  do  not  require  orthog¬ 
onality  and  can  be  difficult  to  obtain;  that  is,  they  are  difficult  to 
generate,  and  relatively  few  situations  are  cataloged. 

2.3  Measures  for  Assessing  Designs 

Our  objective  is  to  generate  and  catalog  designs  that  are 
(1)  readily  available,  like  LHs;  (2)  are  orthogonal,  like  frac¬ 
tional  factorial  designs,  or  at  least  “nearly  orthogonal,”  and 
(3)  have  good  space-filling  properties,  like  uniform  designs.  We 
use  two  measures  to  assess  both  space-filling  and  degree  of 
nonorthoganility,  allowing  for  enhanced  discrimination  when 
choosing  among  candidate  designs.  In  particular,  ties  in  one 
measure  are  often  broken  by  the  other  measure. 

An  orthogonal  design  is  desirable  because  it  gives  uncorre¬ 
lated  estimates  of  the  coefficients  in  a  linear  regression  model 


and  enhances  the  performance  of  many  other  procedures,  such 
as  classification  and  regression  tree  models  (Kim  and  Loh 
2003).  Unless  specifically  stated  otherwise,  when  we  refer  to  a 
design  matrix,  we  mean  only  the  matrix  composed  of  k  columns 
(one  for  each  unique  input  variable,  i.e.,  not  including  functions 
of  the  input  variables,  like  quadratics)  and  n  rows  (which  spec¬ 
ify  the  levels  of  values  taken  by  the  input  variables). 

Our  first  degree  of  nonorthogonality  measure  is  the  maxi¬ 
mum  |py|,  over  all  i,j  such  that  i  ^  j,  with  py  the  pairwise  cor¬ 
relation  between  columns  x,  and  xj.  We  refer  to  this  measure 
as  the  maximum  pairwise  correlation  and  denote  it  as  Pmax- 
The  second  measure  of  orthogonality  is  a  condition  number 
of  X^X,  where  X  is  the  n  x  k  design  matrix.  Condition  num¬ 
bers  are  commonly  used  in  numerical  linear  algebra  to  examine 
the  sensitivities  of  a  linear  system  (Golub  and  Van  Loan  1983). 
They  also  can  reveal  the  degree  of  nonorthogonality  for  a  candi¬ 
date  design  matrix.  The  condition  number  that  we  use  is  defined 
as  cond(X^X)  =  where  and  \lr„  are  the  largest  and 

smallest  eigenvalues  of  X^X  after  the  columns  of  X  are  cen¬ 
tered  to  sum  to  0  and  scaled  to  the  range  [— 1 , 1  ].  A  cond(X^X) 
value  of  1  indicates  an  orthogonal  design  matrix.  A  large  con¬ 
dition  number  indicates  that  the  candidate  design  matrix  may 
be  ill-conditioned.  We  seek  a  condition  number  as  close  to  1  as 
possible. 

There  is  not  a  one-to-one  correspondence  between  p^ax  and 
cond(X^X),  but  the  condition  number  is  related  to  the  number 
of  the  pairs  of  columns  that  are  correlated  and  to  the  magni¬ 
tudes  of  the  correlations.  One  measure,  pmax.  gives  the  worst 
correlation  between  design  matrix  columns,  whereas  the  other 
measure,  cond(X^X),  assesses  the  overall  degree  of  nonorthog¬ 
onality  of  the  design  matrix.  A  design  matrix  will  be  classified 
as  nearly  orthogonal  if  it  has  a  maximum  pairwise  correlation 
no  greater  than  .03  and  a  condition  number  no  greater  than  1.13. 
Although  these  values  are  somewhat  arbitrary,  designs  meeting 
them  suffer  minimal  collinearity  effects,  and  we  show  that  good 
space-filling  designs  exist  with  this  degree  of  nonorthogonality. 

The  two  measures  that  we  use  to  assess  the  space-filling  of  a 
design  matrix  are  the  modified  L2  discrepancy  and  the  Euclid¬ 
ean  maximin  distance.  In  uniform  design  theory,  the  Loo  dis¬ 
crepancy,  equivalent  to  the  Kolmogorov-Smimov  statistic,  is 
usually  used  to  assess  the  space  filling  of  a  design  (Fang  and 
Wang  1994).  Fang  et  al.  (2000a)  stated  that  “this  is  probably 
the  most  commonly  used  measurement  for  discrepancy. . .  and 
has  been  universally  accepted  in  quasi-Monte  Carlo  methods 
and  number  theoretic  methods.”  Unfortunately,  as  they  noted, 
“one  disadvantage  of  [this]  measure  is  that  it  is  expensive  to 
compute.”  When  the  Lqo  discrepancy  is  too  computationally 
burdensome,  as  for  the  designs  considered  herein,  the  modified 
L2  discrepancy  [A/L2;  eq.  (3),  where  designs  are  normalized 
to  [0, 1]  in  each  dimension],  is  often  used  instead  (Hickemell 
1998;  Fang,  Ma,  and  Winker  20()0b).  This  article  uses  ML2  dis¬ 
crepancy  to  measure  the  space  filling  of  a  design,  with  smaller 
values  preferred  over  large  ones. 


2I-* 


n 


En(3-4) 


d=l  i=l 


J  n  n 

-max(xrf,-,xy,)].  (3) 

”  d=i  j=\  1=1 
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The  second  measure  is  the  Euclidean  maximin  (Mm)  dis¬ 
tance  (Johnson  et  al.  1990;  Morris  and  Mitchell  1995).  For  a 
given  design,  define  a  distance  list  d  =  (di ,  ^2.  •  •  • .  </[«(«- 1)]/2). 
where  the  elements  of  d  are  the  Euclidean  distances  between  the 
n  design  points,  ordered  from  smallest  to  largest.  The  Euclidean 
Mm  distance  is  defined  as  di ;  a  larger  value  is  better.  A  large 
value  of  d\  means  that  no  two  points  are  within  d\  of  each  other. 
Note  that  the  Mm  distances  that  we  report  here  are  for  designs 
scaled  to  the  domain  [—  1 ,  1  ]*. 

3.  ORTHOGONAL  LATIN  HYPERCUBES  WITH 
ADDITIONAL  VARIABLES 

The  designs  we  develop  in  this  article  build  on  previous  work 
in  generating  “good”  LHs.  The  random  elements  in  the  con¬ 
struction  of  LHs  means  that  any  given  realization  may  have 
poor  design  properties,  for  example,  a  high  pmax-  This  is  par¬ 
ticularly  likely  when  k  and  n  are  small.  A  common  solution 
is  to  generate  many  random  LHs  and  select  the  one  with  the 
best  properties,  such  as  the  minimum  pmax-  Others  (e.g.,  Flo- 
rian  1992;  Owen  1994)  have  developed  algorithms  that  may  re¬ 
duce  the  off-diagonal  correlations  of  a  LH  design  matrix.  Ye 
(1998a)  went  farther,  deriving  a  method  that  generates  OLHs 
(i.e.,  pmax  =  0),  when  the  number  of  runs  is  a  power  of  2  plus 
1  (the  plus  1  is  the  center  point).  Specifically,  for  any  integer 
m>  1,  his  technique  builds  OLHs  for  k  variables  such  that  the 
number  n  of  runs  is  related  to  k  and  m  by 

n  =  2'"-|-l  and  k  =  2m-2.  (4) 

One  limitation  of  Ye’s  procedure  is  that  too  few  factors  can 
be  varied  in  the  design,  especially  for  n  >  33.  Ye  (1998b)  noted 
that  it  is  possible,  given  n  in  (4),  “that  more  variables  can  be 
accommodated.”  Indeed,  he  listed  a  17  x  8  OLH  and  provided 
the  defining  structure  for  a  33  x  9  OLH.  However,  Ye  (1998b) 
remarked  that  for  m>6,  “we  are  not  able  to  [construct]  an 
OLH. . .  the  maximum  number  of  colunms  are  still  given  by 
[our  eq.  (4)].” 

In  this  section  we  extend  Ye’s  procedure  to  construct  OLH 
designs  (for  k  <  67)  that  can  accommodate  more  variables  for 
a  given  sample  size  when  m  >  5.  We  conjecture  that  this  is  also 
true  for  k  >  67.  These  orthogonal  designs  build  directly  from 
Ye’s  (1998a)  OLH  construction.  Specifically,  the  design  matrix 
is  augmented  with  additional  columns,  thus  permitting  a  greater 
number  of  variables  in  an  orthogonal  design  matrix  with  the 
same  number  of  runs.  To  show  this,  we  first  detail  Ye’s  algo¬ 
rithm. 

In  developing  the  OLHs,  Ye  ( 1998a)  constructed  three  matri¬ 
ces.  One  matrix,  M,  has  its  columns  composed  of  permutations 
of  the  ordinal  values  of  the  positive  levels  of  the  variables.  The 
method  assumes  that  there  is  an  equal  number  of  negative  and 
positive  levels  for  the  variables.  A  second  matrix,  S,  attaches 
a  sign  to  the  levels  in  the  design  matrix;  it  is  like  a  two-level 
factorial  design  matrix  on  m  —  1  variables  containing  m  —  2  in¬ 
teraction  terms.  All  entries  in  S  are  ±1,  and  the  columns  are 
orthogonal  to  one  another.  The  third  matrix,  T,  is  the  element¬ 
wise  (or  Hadamard)  product  of  M  and  S.  A  mirror  image  of  T 
and  a  row  of  O’s  corresponding  to  the  center  point  are  then  ap¬ 
pended  to  the  original  T  to  create  an  OLH.  Details  now  follow. 
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The  dimensions  of  M  are  ^  x  k,  with  q  =  {n—  l)/2  being 
the  number  of  positive  levels  of  each  variable.  The  first  step 
in  constructing  M  is  to  create  a  vector  e,  any  ordering  of  the 
first  q  natural  numbers  (1,2,  ...,9).  One  colunm  in  M  is  e. 
Given  an  initial  e,  permutation  matrices  are  used  to  generate 
the  other  columns  of  M.  Specifically,  for  L  =  1 , 2, . . . ,  m  -  1 , 
create  q  x  q  permutation  matrices  A/,  as  follows.  With  I  the 
2x2  identity  matrix  and  R  =  ^  ^  j,  each  Az,  is  constructed 
by  the  following: 

Az.  =  I (8)  •  •  •  (8)1(8)R(8)---(8>R, 
m—l—L  L 

where  ®  denotes  the  Kronecker  product.  Additional  permuta¬ 
tion  matrices  are  then  created  by  multiplying  distinct  pairs  of 
the  permutation  matrices  Ai  through  A^  _  1  by  one  another. 

In  Ye’s  (1998a)  algorithm,  the  k  =  2/n  —  2  columns  of  M 
are  composed  of  e,  A,e  for  /  =  1, 2, . . . ,  m  -  1,  and  A/Am-  le 
for  /■  =  1 , . . . ,  /M  —  2.  The  modification  used  in  our  construc¬ 
tion  is  that  the  columns  in  M  are,  in  order,  the  vector  e,  the 
vectors  A/e,  for  /  =  1, 2, . . . ,  m  —  1,  as  before,  then  the  vectors 
A/AyC,  for  all  i  and  j  such  that  i  =  1, . . . ,  /n  —  2  and,  for  each 
ij=i+l,...,m-  1.  Thus,  Ye  used  only  m  —  2  of  the  (”2  *) 
possible  pairwise  combinations  of  the  permutation  matrices  Az, 
in  creating  M.  Our  construction  of  M  uses  all  of  the  pairwise 
combinations  of  the  Az,  matrices.  As  a  consequence,  in  our  con¬ 
struction,  not  all  permutations  of  e  generate  an  OLH. 

The  number  of  variables  that  can  be  examined  by  using  all 
pairwise  combinations  of  the  Az,’s  in  M  is  as  follows. 

Fact  I.  With  n  runs,  where  n  =  2™  -I-  1  and  m  is  an  integer 
>  1,  the  maximum  number  of  variables  that  can  be  examined  in 
a  LH,  using  all  original  and  pairwise  combinations  of  the  Az, 
matrices,  is  m  -I-  (""J *)• 

The  proof  is  by  construction.  The  vector  e  constitutes  one 
variable.  Each  Az,  yields  a  colunm  in  the  design  matrix,  giving 
another  m  —  1 .  Finally,  each  of  the  (“^ ' )  pairwise  combinations 
of  the  Az,  matrices  also  corresponds  to  a  column  in  the  design 
matrix. 

As  an  example,  for  wi  =  4,  n  =  17,  and  k  =  7,  with  e  = 
[1,2,...,  8]^,  our  corresponding  M  is  as  follows; 
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Aje 

AsO 

A^Age 

AiAae 

1 
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4 
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8 
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3 

1 

6 

5 
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2 
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3 

8 

7 

5 

1 

6 

2 

4 

For  the  matrices  M  and  S  to  be  conformable  for  elementwise 
products,  Ye’s  S  also  must  be  given  additional  columns.  In  our 
construction,  the  first  column  of  S,  labeled  j,  consists  of  -fl, 
repeated  q  times,  the  next  m  —  1  columns  of  S  are  the  colunms 
used  to  estimate  the  main  effects  in  a  2'"“'  full  factorial  design 
matrix,  and  the  remaining  ("^ ')  colunms  of  S  are  the  colunms 
used  to  estimate  pairwise  interactions  in  a  2'"“'  full  factorial 
design  matrix.  The  latter  can  be  obtained  by  multiplying,  el¬ 
ement  by  element,  the  relevant  pair  of  main  effect  colunms. 
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Specifically,  we  extend  Ye’s  approach  as  follows.  For  k  = 
1, . . . ,  m—  1,  define  the  vector  as  a;^  =  Bi  ®B2®  •  • 
where  B^-*  =  [  ”/  ]  and  B,-  =  [ }  ]  fori  yt  m  -  k.  The  resulting 

m  +  (”2 ')  columns  of  S  are  j,  a,-  for  i  =  1 , . . . ,  m  —  1  and  a, a; 

for  i  =  1, . . . , ffi  -  2,7  =  I ■+  1 . m  -  1.  In  Ye’s  construction, 

the  aja/  columns  are  restricted  to  /  =  1  and  7  =  2, . . . ,  m  —  1. 

Continuing  the  previous  example,  our  corresponding  S  is  as 
follows: 
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*1*2 

*2*3 
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-1 

-1 

-Fl 

-Fl 

-Fl 

+1 

-Fl 
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-1 
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-1 

-F1 

-1 

-1 

-F1 

-1 
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-Fl 

-Fl 

-1 

-Fl 
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-1 

-F1 

-1 

-1 

-Fl 

-Fl 
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-1 
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-1 
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-1 

-FI 

-1 

-Fl 

-Fl 

-1 

-1 

-Fl 

-FI 

-Fl 

-Fl 

-F1 

-Fl 

-Fl 

-Fl 

The  matrix  T  is  the  elementwise  product  of  M  and  S.  The  de¬ 
sign  matrix  is  completed  by  augmenting  T  with  its  mirror  image 
and  the  center  point,  resulting  in  a  17  x  7  LH.  We  represent  an 
OLH  by  OjJ,  where  n  represents  the  number  of  runs  or  experi¬ 
ments  and  k  represents  the  number  of  variables.  The  resulting 
design  is  as  follows: 


Run 

Variable 

A 

Variable 

B 

Variable 
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2 

16 
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8 
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-2 

5 

1 
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17 

-8 

-7 

-5 

-1 

-6 

-2 
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Whether  the  resulting  (2'"  -|- 1)  x  (m  -f  ('"j'))  LH  is  orthog¬ 
onal  depends  on  the  choice  of  e.  In  the  case  where  n  =  17 
and  k  =  7,  there  are  8!  possible  permutations  of  e,  resulting  in 
40,320  possible  M  matrices.  A  complete  enumeration  reveals 
143  distinct  designs.  Unfortunately,  each  of  the  143 
designs  has  an  Mm  distance  of  1.47902;  thus  if  only  this  mea- 
sme  is  used,  then  there  is  no  space-filling  distinction  between 
the  designs.  Therefore,  we  also  consider  the  ML2  discrep¬ 
ancies,  which  range  from  .151854  to  .173952.  The  O]’’  design 
generated  using  e  =  [1, 2, . . . ,  8]^  has  an  ML2  discrepancy  of 
.173223  (almost,  but  not  quite,  the  worst  ML2  discrepancy).  The 
choice  of  e  corresponding  to  the  minimum  (i.e.,  preferred)  ML2 
discrepancy  is  e  =  [1, 2, 8, 4, 5, 6, 7, 3]^.  The  two-dimensional 
projections  of  the  variables  with  the  best  space-filling  0\^,  as 
measured  by  ML2  discrepancy,  are  shown  in  Figure  1. 

For  any  e,  Ye’s  OLH  construction  guarantees  orthogonal  de¬ 
signs  for  n  and  k  as  specified  by  (4).  Using  Proposition  1 ,  OLHs 
can  be  constructed  for  the  number  of  variables  specified  in 
Fact  1. 
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Proposition  '!.  If  the  matrices  M,  S,  and  T,  constructed  as 
described  earlier,  use  e  =  [1, 2, . . . ,  qV,  where  q  represents  the 
number  of  positive  levels,  to  generate  a  LH  (for  up  to  m  =  12, 
i.e.,  it  =  67  and  n  =  4,097),  then  the  resulting  LH  is  orthogonal. 

The  proof  is  by  computational  verification.  That  is,  we  have 
used  this  method  to  construct  an  OLH  for  all  choices  between 
2  and  67  variables.  It  is  conjectured  that  Proposition  1  applies 
for  any  (positive  integer)  value  of  m. 

Table  1  compares  the  number  of  variables  that  can  be  exam¬ 
ined  using  Ye’s  designs  and  the  extended  orthogonal  designs. 
As  with  Ye’s  designs,  these  extended  orthogonal  designs  have 
the  elementwise  square  of  each  column  orthogonal  to  all  of 
the  columns  in  the  design  matrix,  and  the  elementwise  prod¬ 
uct  of  every  two  columns  orthogonal  to  all  columns  in  the  de¬ 
sign  matrix.  Thus  estimates  of  the  linear  effects  [coefficients  /S,-, 
for  I  =  1 , . . . ,  it  in  eq.  (2)]  are  unconelated  with  the  estimates 
of  quadratic  and  bilinear  interaction  effects  (coefficients  ,  for 
i  =  k-‘t-\,...,2k  and  for  i  =  1 , . . . ,  A:  —  1 ,7  =  1  -F  1 , . . . , 
in  eq.  (2)].  However,  in  both  oiir  designs  and  Ye’s  designs,  es¬ 
timates  of  quadratic  and  interaction  effects  in  (2)  can  be  (and 
usually  are)  correlated  with  one  another. 

Table  1  shows  that  as  the  number  of  levels  («)  doubles  (less  1, 
for  the  center  point),  Ye’s  designs  can  accommodate  exactly 
two  more  variables.  In  the  new  designs,  as  the  number  of  levels 
doubles,  the  corresponding  maximum  number  of  variables  in¬ 
creases  by  the  previous  m.  This  difference  grows  dramatically 
as  the  number  of  variables  to  be  explored  increases.  For  exam¬ 
ple,  Ye’s  approach  requires  more  then  16  million  runs  to  build 
an  OLH  for  46  variables. 

4.  CONSTRUCTING  AND  CATALOGING  NEARLY 
ORTHOGONAL  LATIN  HYPERCUBES 

Using  Proposition  1,  2'"  -F  1  by  m  -F  OLHs  exist  for 
jt  >  7;  however,  their  space-filling  properties  are  often  quite 
poor.  By  relaxing  the  requirement  of  orthogonality  and  con¬ 
sidering  LH  designs  that  are  orthogonal  for  at  least  one  per¬ 
mutation  of  e,  we  can  obtain  nearly  orthogonal  designs  that 
have  dramatically  better  space-filling  properties  than  the  OLHs 
constructed  using  Proposition  1.  Toward  this  end,  we  present 
a  computationally  intensive  algorithm  that  produces  nearly 
orthogonal  Latin  hypercube  (NOLH)  designs  with  improved 
space-filling  properties. 

The  heuristic  optimization  used  to  generate  the  designs  is 
described  in  detail  later.  Briefly,  we  screen  millions  of  ran¬ 
dom  LHs  for  good  correlation  properties.  For  the  random  LHs 
with  the  best  correlation,  Florian’s  (1992)  correlation  reduction 
method  is  iteratively  applied,  (the  App.  provides  for  details  on 
Florian’s  method.)  A  large  number  of  random  LHs  is  screened 
because  Florian’s  procedure  provides  only  limited  improve¬ 
ment  in  pmax  and  cond(X^X).  By  extensive  exploratory  calcu¬ 
lations,  we  have  found  that  screening  for  Pmax  and  cond(X^X) 
speeds  the  process  and  enhances  the  nonorthogonality  mea¬ 
sures  of  the  final  design  matrix.  Of  those  candidate  designs  that 
satisfy  the  near-orthogonality  constraints  of  a  maximum  pair- 
wise  correlation  no  greater  than  .03  and  a  condition  number  no 
greater  than  1.13,  the  design  with  the  smallest  rank  sum  of  the 
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Figure  1.  Two-Dimerisional  Projections  of  the  Best  Space-Fitting  oy  Design  With  Ait  of  the  Variabtes  Seated  to  Range  From  -1  to  1. 


Euclidean  Mm  distance  and  ML2  is  selected  and  cataloged.  This 
optimization  problem  is  represented  as 

minimize  /(Mm,  ML2), 

subject  to  pmax  <  03  (5) 

cond(X^X)<  1.13. 

The  constraints  in  (5)  require  a  nearly  orthogonal  design.  The 
objective  function,  /,  that  we  minimize  is  the  rank  sum  of  the 
two  space-filling  properties  for  the  designs  that  meet  the  con¬ 
straints. 

Experimental  designs  with  near  orthogonality  are  denoted  by 
where  N  represents  near  orthogonality,  n  is  the  number  of 
runs  or  experiments,  and  k  is  the  number  of  variables.  Our  al¬ 
gorithm  for  finding  NOLH  experimental  designs  comprises  the 
following  steps: 

Step  1.  Determine  the  number  of  variables  (k  >  7)  required. 
If  the  number  of  variables  is  other  than  11,  16,  22, 
or,  more  generally,  (/n  -|-  ('"J*)),  then  round  the  re¬ 
quired  number  of  variables  up  to  the  nearest  of  these 
numbers. 


Step  2.  Establish  a  maximum  threshold  pairwise  correlation 
value  and  a  maximum  threshold  condition  number. 
Based  on  extensive  experimentation,  we  use  pmax  = 
.05, .  17, .  16  and  cond(X^X)  =  1.15, 2.4, 2.8  for  k  = 
11, 16,  and  22. 

Step  3.  Using  a  randomly  permuted  e,  construct  a  design 
matrix  as  described  in  Section  3. 

Step  4.  Calculate  the  pairwise  correlations  and  the  condition 
number  of  the  candidate  matrix. 

Step  5.  If  either  value  in  Step  4  exceeds  the  thresholds  in 
Step  2,  then  discard  the  design  and  return  to  Step  3  to 
regenerate  with  another  randomly  permuted  e.  Oth¬ 
erwise,  keep  the  design  and  proceed  to  Step  6.  Re¬ 
peat  Steps  3-5  until  a  desired  number  of  candidate 
designs  are  found.  If  not  enough  are  found,  relax  the 
criteria  in  Step  2  and  begin  again.  We  have  found  that 
15  candidate  designs  works  well  for  Steps  6-8. 

Step  6.  Subject  each  of  the  candidate  designs  to  repeated  ap¬ 
plications  of  Florian’s  method  to  decrease  the  max¬ 
imum  pairwise  correlation  and  condition  numbers. 
Stop  when  no  further  improvement  is  noted. 


Table  1.  A  Comparison  Illustrating  the  Increased  Number  of  Variables  That  Can  Be  Examined  by  Extending  Ye’s  (1998a,  1998b) 

Construction  Algorithm  for  OLHs 


Total  number  of 

Maximum  number  of 

Number  of  variables 

Maximum  number  of 

levels  for  each 

variables  (k)  for  Ye's 

(k)  in  OLH  identified 

variables  (k)  by 

variable  (n) 

m 

(1998a)  OLH  method 

by  Ye  (1998b) 

extending  Ye's  OLH 

17 

4 

6 

8 

7 

33 

5 

8 

9 

11 

65 

6 

10 

NA 

16 

129 

7 

12 

NA 

22 

257 

8 

14 

NA 

29 

513 

9 

16 

NA 

37 

1,025 

10 

18 

NA 
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Step  7.  Calculate  the  Mm  distance  and  ML2  discrepancy  for 
each  of  the  Step  6  designs.  Rank  the  designs  ac¬ 
cording  to  some  combination  of  these  measures.  We 
chose  the  design  with  the  minimum  rank  sum  over 
the  two  measures. 

Step  8.  If  a  number  of  variables  other  than  7,  1 1,  16,  22,  or 
(m  -f-  is  required,  then  construct  each  possi¬ 
ble  subset  having  the  appropriate  number  of  columns 
from  the  Step  7  design  and  calculate  the  Mm  dis¬ 
tance  and  ML2  discrepancy.  Choose  the  design  with 
the  best  combination  of  these  two  measures. 

By  applying  these  steps,  the  best  and  de¬ 

signs  (as  of  this  writing)  have  been  identified;  the  complete 
designs  have  been  given  by  Cioppa  (2002).  For  each  of  these 
cases,  we  identified  15  designs  that  satisfy  our  definition 
of  near-orthogonality.  The  one  with  the  best  space  filling — 
determined  as  the  minimum  rank  sum  of  Mm  distance  and  ML2 
discrepancy — ^was  selected  as  the  best  NOLH  by  the  procedure 
proscribed  earlier.  Table  2  compares  the  best  NOLH  design  to 
the  same-sized  OLH  and  an  average  LH  designs.  The  average 
LH,  denoted  “mean  LH(k,  n),”  is  empirically  determined  by 
generating  1,000  random  LHs  for  the  specified  n  and  k  and  av¬ 
eraging  the  measures.  Table  2  shows  that  the  best  NOLH  has 
much  better  space-filling  than  the  OLH,  at  the  expense  of  only 
a  slight  departure  from  orthogonality.  Moreover,  we  see  that  in 
the  ranges  of  n  and  k  examined,  the  NOLHs  are  vastly  supe¬ 
rior  to  a  random  discrete  uniform  LHs.  Finally,  the  best  NOLH 
has  better  space-filling  properties  than  an  average  random  LH, 
which  is  superior  to  the  OLH. 

Except  for  the  design,  there  is  no  guarantee  that  the 
designs  generated  from  this  algorithm  are  optimal  among  de¬ 
signs  created  using  the  foregoing  algorithm  (i.e.,  over  all  per¬ 
mutations  of  e).  However,  these  designs  are  nearly  orthogonal 
and  typically  have  much  better  space-filling  properties  than  the 
orthogonal  or  random  LHs.  The  designs  are  readily  available; 
recommended  designs  for  from  2  to  22  variables  and  17  to  129 
runs  have  been  given  by  Cioppa  (2002).  An  easy-to-use  spread¬ 
sheet  with  this  family  of  designs  also  has  been  made  available 
by  Lucas  and  Sanchez  (2005).  This  algorithm  has  many  ad  hoc 
features  in  this  algorithm.  Many  other  approaches  were  tried: 
the  one  described  above  produces  good  designs  in  a  reasonable 
amount  of  time. 


Table  2.  Comparing  NOLHs  to  OLHs  and  Discrete  Uniform  LHs  With 
Respect  to  Our  Space-Riling  and  Nrmorthogonality  Measures 


Design 

Max  pairwise 
correlation 

Condition 

number 

Mm 

distribution 

MLz 

0 

1 

1.671 

.95 

Best 

.0234 

1.123 

1.758 

.73 

Mean  LHC(11, 33) 

.4401 

8.671 

1.295 

1.01 

0 

1 

1.794 

7.98 

Best  N^ 

.0219 

1.103 

2.035 

4.46 

Mean  LHC(16, 65) 

.3194 

6.103 

1.647 

5.37 

0^ 

0 

1 

1.789 

96.6 

Best/V^ 

.0015 

1.036 

2.265 

37.8 

Mean  LRC(22, 129) 

.2332 

4.073 

1.899 

59.8 

5.  EXPANDING  THE  SET  OF  READILY  AVAILABLE 
NEAR-ORTHOGONAL  LATIN  HYPERCUBE  DESIGNS 

We  developed  the  designs  constructed  in  Sections  3  and  4  for 
specific  n  and  k  combinations.  In  this  section  we  illustrate  how 
effective  the  designs  (obtained  using  Step  8  from  the  previous 
section)  with  fewer  variables  can  be,  and  show  a  useful  way  to 
build  off  of  the  existing  designs  if  taking  more  runs  is  feasible. 

5.1  Designs  With  Fewer  Variables 

As  mentioned  in  Step  8  of  the  algorithm  (see  Sec.  4),  we  may 
require  a  number  of  variables  other  than  7,11,16, 22,  and  so  on. 
Of  course,  when  we  delete  columns,  the  new  (smaller)  design 
is  a  subset  of  a  design  with  nearly  orthogonal  columns.  There¬ 
fore,  collinearity  will  not  be  an  issue,  and  consequently,  we  fo¬ 
cus  on  space-filling  properties  when  selecting  which  columns 
to  delete.  Rather  than  construct  a  design  by  other  methods,  we 
assume  that  a  design  obtained  by  eliminating  columns  from  the 
algorithmically  developed  (Steps  1-7)  design  will  result  in  a 
design  with  good  space-filling  properties.  One  test  of  this  ap¬ 
proach  on  the  goodness  of  our  design’s  space-filling  properties 
is  found  for  designs  with  2  variables  and  17  levels.  Specifically, 
Table  3  compares  three  designs:  (1)  the  design,  derived 
by  taking  the  two  columns  with  the  top  space-filling  proper¬ 
ties  from  the  best  O7’  design  (see  Fig.  1);  (2)  the  published 
optimal  uniform  design  of  Fang  and  Wang  (1994);  and  (3)  the 
design  with  the  best  Mm  distance  measure,  taken  from  Morris 
and  Mitchell  (1995). 

The  space-filling  measures  of  the  design  are  nearly  equal 
to  those  of  the  designs  with  optimum  space-filling  properties. 
Indeed,  the  Oj’  has  an  ML2  discrepancy  only  15%  higher  than 
that  of  the  optimal  uniform  design  and  an  Mm  distance  about 
3%  lower  than  that  of  the  best  maximin  design.  In  fact,  the 
design  has  a  substantially  better  Mm  distance  than  the  uniform 
design.  It  is  noteworthy  that  all  of  these  designs  have  zero  or 
minimal  correlations.  Our  experience  is  that  when  n  is  large 
relative  to  k,  designs  with  good  space-filling  properties  often 
are  close  to  being  orthogonal. 

The  situation  analyzed  in  Table  3,  with  n  =  17  and  k  =  2, 
is  not  a  situation  in  which  a  sophisticated  design  is  typically 
needed.  Indeed,  a  two  factor,  four-level  full-factorial  design, 
augmented  with  a  center  point,  is  orthogonal  and  has  reason¬ 
able  space-filling  properties  (with  an  Mm  distance  of  .471  and 
an  ML2  discrepancy  of  .0295).  This  example  was  selected  for 
comparison  because  there  are  very  few  combinations  of  n  and  k 
for  which  optimum  space-filling  designs  are  readily  available, 
especially  for  large  n  and  k.  This  is  the  only  condition  that  we 
have  found  in  which  we  can  make  a  direct  comparison  between 


Table  3.  Comparison  of  the  Best  Uniform,  and  Mm  Distance 
Designs  for  the  17-Run,  2-Variable  Case 


Maximum  pairwise 
correlation 

Condition 

number 

Mm 

distribution 

ML2 

(0)J^  design 

0 

1 

.515 

.0025 

Uniform  design 
Best  Mm  distance 

0 

1 

.279 

.0022 

design 

.0588 

1.125 

.530 

.0024 
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Table  4.  A  Comparison  of  Space-Filling  and  Nonorthogonality 
Measures  for  Nine-Level  Designs  With  (about)  33  Levels 


Design 

Max  pairwise 
correlation 

Condition 

number 

Mm 

distribution 

ML2 

Best 

.023 

1.100 

1.512 

.229 

Mean  LHC(9,  33) 

.413 

6.085 

1.030 

.347 

Ye  best  Mm  dist  0^ 

0 

1 

1.720 

.215 

f-27  Taguchi  array 

0 

1 

2.236 

2.806 

our  designs  with  fewer  variables  and  published  optimal  Mm 
distance  and  uniform  designs. 

A  more  interesting  comparison  is  between  the  suggested 
Np  from  Cioppa  (2002)  and  similar-sized  alternative  designs 
that  we  might  consider.  Table  4  shows  the  nonorthogonality 
and  space-filling  measures  of  the  recommended  against 
the  mean  of  1,(XX)  9-factor,  33-level  random  LHs,  Ye’s  (2(X)5) 
“33  X  9  maximin  distance  OLH,”  and  a  9-factor  (each  with  3 
levels),  27-run  Taguchi  array  generated  by  using  the  JMP  Statis¬ 
tical  Discovery  software  (SAS  Institute  Inc.  2004).  We  see  that 
the  reduced  variable  is  greatly  preferred  to  a  typical  random 
LH.  It  also  performs  quite  well,  although,  of  course,  not  quite  as 
good  as  Ye’s  maximin  distance  OLH.  A  traditional  three-level 
design,  such  as  the  L27  produced  by  JMP,  has  a  sparse  inte¬ 
rior  and  thus  has  an  ML2  discrepancy  substantially  worse  than 
those  of  any  of  the  LHs  and  uniform  designs.  Finally,  although 
not  listed  in  the  table  (because  the  full  design  is  not  available). 
Fang  (2005)  gave  the  ML2  discrepancies  for  uniform  designs 
with  up  to  29  factors  and  up  to  30  runs.  The  9-factor,  30-run 
design  has  an  ML2  discrepancy  of .  1 80. 

5.2  Generating  Additional  Design  Points 

It  may  be  desirable  to  add  design  points  to  the  original  design 
matrix  so  as  to  improve  the  design’s  space-filling  properties  and 
maintain  orthogonality  (in  the  case  of  seven  or  fewer  variables) 
or  near  orthogonality  (for  more  than  seven  variables).  The  ad¬ 
ditional  points  can  be  used  to  test  how  well  a  meta-model  fit  to 
the  original  data  predicts  the  new  observations.  It  is  possible  to 
add  the  second-best  design  from  the  foregoing  process  to  the 
original  design,  then  the  third-best  design,  and  so  on.  A  much 
simpler  alternative  permutes  columns  of  the  “best  design’’  and 
appends  them  to  the  “best  design.”  Permuting  the  columns  of  a 
design  matrix  does  not  affect  its  space-filling  measures  or  de¬ 
gree  of  nonorthogonality.  Thus  this  approach  reuses  all  of  the 
effort  that  went  into  generating  the  best  space-filling  NOLH  and 
requires  that  only  the  best  design  be  cataloged.  The  center  point 
run  is  redundant  and  thus  is  not  repeated.  Therefore,  if  n  was  the 
number  of  runs  in  the  initial  design  matrix,  then  the  appended 
design  adds  «  —  1  runs.  Proposition  2  gives  the  encouraging 
result  that  the  maximum  pairwise  correlation  in  the  appended 
design  is  less  than  or  equal  to  what  it  is  in  the  original  design. 
Note  that  because  points  are  added  to  the  original  region  with¬ 
out  an  increase  in  dimensionality,  the  ML2  discrepancy  usually 
decreases,  and  the  Mm  distance  is  nonincreasing. 

Proposition  2.  By  permuting  the  columns  of  the  original 
NOLH  containing  n  runs  and  appending  these  columns  to  the 
original  NOLH,  the  number  of  runs  is  increased  to  (2n  —  1 ),  and 
the  maximum  pairwise  correlation  is  nonincreasing. 
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Proof.  The  correlation  r(v,  w)  between  two  columns,  v 
and  w,  in  a  design  matrix  is  defined  as  r(v,  w)  =  [5^(v,-  — 
vlfw,  —  w)]/ y[XI(Vi  —  v)^  JZfH',-  —  vv)2].  Furthermore,  without 
loss  of  generality,  we  consider  the  absolute  value  of  r  (v,  w)  in 
determining  the  maximum  pairwise  correlation.  For  a  sample 
size  of  n,  the  values  in  the  columns  of  our  LHs  take  the  integer 
values  from  (— n  -|- 1)/2  to  (n  —  l)/2.  Thus,  for  any  column  v, 
V  =  0  and  v?  =  [(«  —  \)n(n  +  1)]/12.  Therefore,  for  any  two 
columns  of  v  and  w,  r(v,  w)  =  v,w,/[(n  -  l)n{n  -I-  1)/12]. 

Now  assume  that  the  columns  of  the  design  matrix  are  per¬ 
muted,  and  that  we  append  the  permuted  matrix  to  the  bot¬ 
tom  of  the  initial  design  matrix  to  create  a  new,  expanded 
design  matrix.  The  new  columns  consist  of  n  -I-  (n  —  1)  en¬ 
tries.  Suppose  that  columns  x  and  y  are  appended  to  v  and  w. 
Then  the  correlation  between  the  two  columns  is  rnew(v  :  x,  w : 
y)  =  v/vv,-  -I-  ^x,y,]/[(n  -  l)n(/i  -\- 1)/6].  Note  that  the  de¬ 
nominator  of  rnew(v :  X,  w :  y)  is  twice  that  of  r(v,  w).  Without 
loss  of  generality,  suppose  that  maximum  pairwise  correlation 
is  greater  than  or  equal  to  the  negative  of  the  minimum  pair¬ 
wise  correlation.  Moreover,  suppose  that  r(v,  w)  =  Anax;  then 
'•(X,  y)  <  r(v,  w).  Therefore,  rnew(v :  x,  w :  y)  <  r(v,  w). 

Because  the  original  experimental  design  is  nearly  orthog¬ 
onal,  the  maximum  pairwise  correlation  value  and  condition 
number  are  generally  improved  only  marginally.  Thus,  when  se¬ 
lecting  columns  to  permute,  it  seems  wise  to  emphasize  space¬ 
filling  properties.  In  the  design,  an  exhaustive  enumeration 
of  the  7!  column  permutations  is  possible.  In  finding  the  best 
permutation  of  columns  to  be  appended,  the  rank  sum  of  the 
Mm  distance  and  the  ML2  discrepancy  are  used  in  the  same  way 
as  was  done  above  when  seeking  columns  to  delete.  Exhaus¬ 
tive  enumerations  of  the  column  permutations  for  the  yv^|, 
and  ^22^  designs  are  not  feasible.  One  possibility  is  to  sample 
randomly  from  the  possible  permutations,  rank-order  the  result¬ 
ing  designs  for  their  Mm  distances  and  ML2  discrepancies,  and 
choose  the  permutation  design  with  the  smallest  rank  sum.  To 
do  this  more  efficiently,  we  use  a  heuristic  to  narrow  the  possi¬ 
ble  permutations  for  the  random  sampling.  The  objective  of  the 
heuristic  is  to  identify  variables  that  perform  well  and  poorly  on 
space-filling  properties,  so  poor  performers  can  be  appended  to 
good  performers. 

This  is  done  as  follows.  The  ML2  discrepancy  is  calculated 
for  each  combination  of  three  variables  [e.g.,  in  the  design, 
there  are  (’3)  =  165  combinations].  The  ML2  discrepancies  are 
then  rank-ordered  from  highest  (i.e.,  worst  space-filling)  to  low¬ 
est  (i.e.,  best  space-filling).  The  number  of  times  that  each  vari¬ 
able  appears  in  a  combination  with  a  high  ML2  discrepancy 
(in  the  design,  this  is  the  upper  82  of  the  165  measures) 
is  compared  with  the  number  of  times  that  each  variable  ap¬ 
pears  in  a  combination  having  a  low  ML2  discrepancy  (e.g.,  in 
the  design,  this  is  the  smallest  82  measures).  Under  the 
assumption  of  randomness  that  a  variable  has  the  same  likeli¬ 
hood  of  appearing  in  either  the  upper  half  or  lower  half  of  these 
combinations,  an  exact  binomial  test  (see  Conover  1999)  at  the 
.10  significance  level  is  performed  to  identify  variables  that  are 
more  likely  to  appear  in  the  better  combinations  and  those  that 
are  more  likely  to  appear  in  the  poorer  combinations.  For  exam¬ 
ple,  if  a  variable  appears  in  the  upper  half  94  times  and  in  the 
lower  half  70  times,  then  the  associated  p  value  from  the  exact 
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binomial  test  is  .072.  This  variable  is  designated  as  a  variable 
more  likely  to  appear  in  the  upper  half  (worst  space-filling).  The 
best-performing  variables  are  then  restricted  to  being  appended 
to  the  poorest-performing  variables  (e.g.,  the  aforementioned 
variable). 

Although  other  heuristics  are  possible,  this  one  allows  us  to 
quickly  find  additional  design  points  that  improve  on  both  the 
design’s  near-orthogonality  measures  and  space-filling  proper¬ 
ties.  A  significance  level  of  .10  is  chosen  (over,  say  .05)  to  per¬ 
mit  identification  of  a  greater  number  of  variables  as  good  and 
poor  performers.  This  heuristic  has  identified  good  (although 
not  necessarily  globally  optimal)  permutations,  whereas  ran¬ 
dom  sampling  has  not  found  a  better  permutation  in  a  much 
greater  number  of  attempts.  Cioppa  (2002)  gave  suggested  per¬ 
mutations  of  the  columns  of  the  design  matrices  to  append  to 
the  best  Oj’,  and  designs. 

6.  EXPLORATORY  ANALYSIS  OF  A  PEACE 
ENFORCEMENT  SCENARIO 

These  new  designs  were  developed  so  that  analysts  could 
conduct  computer  experiments  and  explore  the  output  flexi¬ 
bly.  This  section  summarizes  one  such  exploration  (see  Cioppa 
2002). 

Recent  years  have  brought  an  increased  emphasis  on  using 
military  forces  for  operations  other  than  war,  such  as  peace  en¬ 
forcement.  The  United  States  Army  Field  Manual  100-23  (De¬ 
partment  of  the  Army  1994)  describes  peace  enforcement  as 
“the  application  of  military  force  or  the  threat  of  its  use,  nor¬ 
mally  pursuant  to  international  authorization,  to  compel  com¬ 
pliance  with  generally  accepted  resolutions  or  sanctions.  The 
purpose  of  peace  enforcement  is  to  maintain  or  restore  peace 
and  support  diplomatic  efforts  to  reach  a  long-term  political 
settlement.”  Operations  of  this  nature  are  becoming  coimnon 
for  the  military.  Furthermore,  many  questions  exist  about  doc¬ 
trine  and  tactics  for  units  conducting  peace  enforcement  opera¬ 
tions. 

To  generate  hypotheses  about  light-infantry  platoon-level 
peace  enforcement  tactics,  we  explored  a  scenario  of  such  a  pla¬ 
toon  clearing  an  area  to  facilitate  United  Nations  (UN)  food  dis¬ 
tribution  and  military  convoy  operations  using  the  agent-based 
simulation  MANA  (Lauren  and  Stephen  2001).  This  scenario  is 
a  challenging  one,  because  the  Blue  force  (a  U.S.  Army  light- 
infantry  platoon)  is  subjected  to  a  series  of  encounters  with  a 
hostile  indigenous  Red  force  and  an  originally  nonhostile  Yel¬ 
low  force  that  turns  hostile  as  the  scenario  progresses.  This  sce¬ 
nario  has  been  deemed  doctrinally  correct  and  plausible  by  the 
U.S.  Army  Infantry  Simulation  Center  at  Fort  Benning,  Geor¬ 
gia.  To  examine  the  effects  of  unit  cohesion  and  agent  person¬ 
alities,  22  variables  (labeled  A-V)  were  identified  for  experi¬ 
mentation.  These  22  variables  were  selected  from  among  many 
available  variables,  and  their  levels  were  chosen  based  on  the 
author’s  military  experience  and  judgment  and  on  the  results 
of  hundreds  of  small,  interactive  experiments.  The  chosen  in¬ 
puts  control  the  entities’  movement  capabilities  and  personali¬ 
ties.  Personalities  refer  to  the  agents’  propensities  to  move  to¬ 
ward  or  away  from  scenario  objects,  such  as  friendly  agents, 
hostile  agents,  or  a  positional  goal.  These  variables  affect  the 


simulated  units’  speed,  cohesiveness,  and  aggressiveness.  (See 
Cioppa  2002  for  more  details  on  MANA  and  the  scenario.) 

As  is  often  the  case  with  DoD  exploratory  analysis,  the  model 
is  not  being  used  to  predict  potential  outcomes.  Indeed,  due  to  a 
lack  of  data,  the  model  cannot  be  empirically  validated.  Rather, 
the  model  is  used  to  help  us  devise  new  ideas  or  assess  the 
consequences  of  certain  assumptions.  Potential  insights  gleaned 
from  such  exploration  usually  need  to  be  tested  elsewhere,  per¬ 
haps  with  field  experiments. 

Because  22  continuous  factors  were  chosen,  the  best  space¬ 
filling  ^22^  design  was  used.  The  output  on  which  we  focus 
here  is  the  exchange  ratio  (ER),  the  number  of  Red  agents  at- 
tritted  divided  by  the  number  of  Blue  agents  attritted;  higher  is 
considered  better.  Due  to  the  high  variability  of  ER  for  given 
input  values  (e.g.,  coefficient  of  variation  typically  >.40),  the 
availability  of  substantial  computing  assets,  and  our  interest  in 
examining  output  variability,  for  each  of  the  129  input  combi¬ 
nations,  100  iterations  were  conducted  using  different  random 
seeds;  this  gave  12,900  individual  runs.  A  regression  equation 
representing  the  MANA  results  was  found  interactively  through 
trial  and  error,  using  visualization  and  forward  and  backward 
stepwise  selection  (to  include  quadratic  terms  and  two-variable 
interactions).  The  Akaike  information  criterion  (Akaike  1974), 
sum  of  squares,  and  residual  plots  were  the  primary  measures 
used  to  build  the  model.  Additional  design  points  (12,800) 
were  then  generated,  as  specified  in  Section  4.4,  as  a  way  to 
cross-validate  the  hypothesized  regression  equation.  This  is  in 
keeping  with  the  Department  of  Defense’s  policy  of  model- 
experiment-model  (Piplani,  Mercer,  and  Roop  1994).  Here  the 
m^el  is  our  meta-model.  After  building  the  meta-model,  we 
assessed  how  well  it  predicts  the  experiments  (in  MANA)  at 
new  input  combinations.  This  process  was  continued  until  the 
model  was  deemed  “good  enough.”  The  results  shown  in  Fig¬ 
ure  2  indicate  that  the  differences  between  the  predicted  re¬ 
sponses  and  the  actual  responses  were  acceptably  low  in  the 
opinion  of  military  subject  matter  experts  at  the  U.S.  Army  In¬ 
fantry  Simulation  Center  (Cioppa  2002).  Thus  the  first  and  sec¬ 
ond  experiments  are  combined,  and  a  regression  analysis  was 
executed  on  the  257  input  variable  combinations  and  25,700  to¬ 
tal  simulation  runs. 


Figure  2.  Predicted  Values  versus  Actual  Values  (second  experi¬ 
ment)  With  Least  Squares  Fitted  Line  (—)  and  Weighted  Least  Squares 
Line  (■ . )  tor  the  Mean  ERs. 
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The  resulting  model,  shown  in  (6),  has  an  of  .67.  The  fitted 

exchange  ratio  as  a  function  of  the  22  independent  variables  is 

ER  =  1.890  +  (1.928e-007)t/2  +  (.000457)B  +  (.000736)£: 

+  (.00237)F  +  (.00568)G  +  (.000826)/*  -  (.00898)// 

-  (.00327)  V  -  (4.866e-006)B//  -  (3.021e-005)Gt/ 

-  (2.688e-005)FV+  (1.378e-005)/y 

+  (2.225e-006)BiV.  (6) 

We  see  that  the  fitted  model  has  quadratic  and  interaction 
terms,  including  an  interaction  in  which  neither  variable  shows 
up  as  a  main  effect.  Guided  by  the  regression,  and  many  visual 
playbacks,  several  working  hypotheses  based  on  this  scenario 
can  be  gleaned,  including  the  following: 

•  Speed  of  execution  (C/)  and  precision  of  movement  (B)  are 
critical  to  Blue’s  success. 

•  When  the  Blue  elements  are  in  contact  with  the  threat,  they 
should  consider  moving  toward  other  firiendly  elements 
(variables  £,  F,  G) — ^that  is,  massing  its  forces. 

•  When  Blue  elements  have  taken  casualties  and  are  in  con¬ 
tact,  continuing  the  operation  instead  of  ceasing,  though 
simultaneously  trying  to  move  toward  other  friendly  units, 
may  be  advantageous  over  the  course  of  the  battle  (vari¬ 
ables  1,J,N,P). 

It  is  important  to  emphasize  that  in  this  exploratory  analysis, 
we  are  primarily  trying  to  identify  the  factors  that  have  a  sig¬ 
nificant  effect  on  the  exchange  ratio,  along  with  the  directions 
of  those  effects  and  which,  if  any,  factors  interact.  We  consid¬ 
ered  a  quadratic  to  be  sufficient  for  these  purposes.  However, 
for  purposes  of  prediction,  a  second-order  model  may  be  in¬ 
adequate;  for  example,  a  second-order  response  surface  has  at 
most  one  extremum.  Unless  one  had  reason  to  believe  that  the 
actual  computer  model  has  this  property,  perhaps  a  more  flex¬ 
ible  model  (e.g.,  spatial  models;  see  Cressie  1993)  would  be 
more  appropriate. 

An  unexpected  thing  that  came  out  of  the  exploration  was 
the  discovery  of  a  software  bug  over  a  relatively  narrow  range 
of  one  of  MANA’s  parameters.  In  particular,  the  input  control¬ 
ling  the  agent’s  movement  range  (//)  was  found  to  not  work 
properly  at  values  of  101-1 14.  Note  that  this  input  takes  integer 
values  between  0  and  200.  These  thresholds  were  easily  iden¬ 
tified  by  regression  trees.  Identifying  this  bug,  and  specifying 
the  range  of  the  problem,  would  have  been  essentially  impossi¬ 
ble  with  a  low-level  factorial  or  central  composite  design.  This 
discovery  was  reported  to  the  developers  of  the  model  and  il¬ 
lustrates  these  designs’  potential  in  verifying  models  as  well 
as  using  them.  This  example  illustrates  a  critical  advantage  of 
using  the  design  for  exploration  over  a  replicated  central 
composite  or  other  competing  design.  Specifically,  the  de¬ 
sign  provides  better  resolution  on  critical  thresholds  or  change- 
points.  As  such,  we  have  found  that  they  are  particularly  suit¬ 
able  for  generating  data  to  be  analyzed  with  regression  trees. 
(See  Cioppa  2002  for  more  on  this  analysis.) 

Although  the  foregoing  exploration  emphasized  regression 
and  visualization,  the  data  are  amenable  to  analysis  by  a  host 
of  methods.  For  example,  using  the  design,  Ipekci  (2002) 
applied  neural  networks,  regression  trees,  multiple  additive  re¬ 
gression  trees,  and  Bayes  nets  to  explore  the  relationships 
among  the  input  settings  and  the  outputs. 
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7.  CONCLUSIONS  AND  AREAS  FOR 
FURTHER  RESEARCH 

In  the  DoD  and  elsewhere,  we  are  increasingly  reliant  on 
computer  simulations  in  which  there  is  a  vast  space  of  possible 
computational  experiments.  In  selecting  a  design,  there  is  a  rich 
literature  from  which  we  can  draw.  Many  of  the  commonly  used 
designs  were  originally  developed  for  agricultural,  industrial,  or 
laboratory  experiments.  Unfortunately,  most  of  these  designs 
were  developed  for  situations  involving  a  modest  number  of 
factors  and  runs  in  which  strong  assumptions  are  made  a  priori 
about  the  response  (e.g.,  low-order  polynomial)  and  error  (e.g., 
homoscedastic  normal  error).  These  situations  are  often  not  ap¬ 
plicable  to  computer  exploration.  Thus  we  desire  readily  avail¬ 
able  designs  that  allow  analysts  to  explore  how  well  a  diverse 
set  of  meta-models  captures  the  relationships  between  many  in¬ 
put  variables  of  a  simulation  and  one  or  more  output  variables. 
Toward  that  end,  we  have  presented  an  algorithm  that  generates 
NOLHs  with  good  space-filling  properties.  These  designs  allow 
an  analyst  to  examine  many  factors  by  fitting  models  with  main, 
quadratic,  and  interaction  effects  with  nearly  uncorrelated  esti¬ 
mates  of  the  regression  coefficients  for  the  linear  effects  terms. 
Having  identified  the  dominant  variables,  analysts  have  consid¬ 
erable  flexibility  in  fitting  meta-models  to  them.  Furthermore, 
we  have  cataloged  and  implemented  in  a  spreadsheet  a  set  of 
these  designs  for  from  2  to  22  factors  in  as  few  as  129  input 
combinations,  so  they  are  readily  available  (Cioppa  2002;  Lu¬ 
cas  and  Sanchez  2005;  and  http://harvest.nps.edu). 

Two  areas  related  to  this  research  are  particularly  worthy 
of  exploration.  The  first  area  concerns  design  matrices  with  a 
large  number  of  both  continuous  quantitative  and  qualitative 
variables.  Currently,  when  a  variable  contains  fewer  levels  than 
runs,  the  levels  are  used  more  than  once.  For  example,  if  one 
factor  has  two  levels,  say  high  and  low,  then  all  of  the  posi¬ 
tive  values  in  the  appropriate  column  of  our  NOLH  are  set  to 
high  and  the  negative  values  are  set  to  low.  The  appropriate  col- 
unui  is  chosen  by  searching  over  all  columns  and  selecting  the 
one  that  provides  the  best  performance  with  respect  to  our  near¬ 
orthogonality  and  space-filling  measures.  This  method  works 
reasonably  well  when  there  are  only  a  handful  of  qualitative 
factors  (see  Cioppa  and  Brown  2003);  a  thorough  examination 
over  a  variety  of  cases  is  necessary.  The  second  area  of  con¬ 
tinuing  research  concerns  sequencing,  combining,  and  crossing 
the  proposed  designs  with  full-factorial,  fractional  factorial,  or 
group  screening  designs. 
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APPENDIX:  CORRELATION  REDUCTION  METHOD 

In  brief,  Florian’s  (1992)  method  is  as  follows.  For  each  col¬ 
umn  of  a  design  matrix  X,  each  element  is  replaced  with  its 
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rank  within  the  column.  This  n  x  k  matrix  is  denoted  by  W. 
Let  C,  a  it  X  it  matrix,  be  the  rank  correlation  matrix  of  W.  If 
each  pair  of  colunms  in  W  is  uncorrelated,  then  C  is  the  k  x  ^ 
identity  matrix  I.  Only  those  realizations  of  W  for  which  C  are 
positive  definite  are  considered.  The  basic  idea  is  to  transform 
W  into  a  set  of  uncorrelated  variates.  A  Cholesky  factorization 
scheme  is  used  (because  C  is  positive  definite)  to  determine  a 
lower-triangular  matrix,  Q,  which  is  kx  k.  Then  let  D  =  Q“' 
and  C  =  QQ^  such  that  D  has  the  property  DCD^  =  I.  The 
original  W  is  then  transformed  into  a  new  matrix  by  Wb  = 
WD^.  Because  the  elements  of  the  matrix  Wb  are  not  necessar¬ 
ily  integral,  the  elements  in  each  column  are  replaced  by  their 
rank. 

Iman  and  Conover  (1980)  proved  that  the  difference  between 
corresponding  elements  in  die  correlation  matrix  of  Wb  and 
I  is  lower  than  the  analogous  difference  in  W  and  I.  Because 
the  elements  of  Wb  are  replaced  by  ranks,  this  process  can  be 
repeated  until  there  is  no  further  decrease  in  the  maximum  pair¬ 
wise  correlation  or  condition  number.  When  applying  this  pro¬ 
cedure  iteratively,  it  is  quite  common  for  the  maximum  pairwise 
correlation  to  not  change,  but  the  condition  number  to  decrease. 
Thus,  if  the  procedure  uses  only  the  maximum  pairwise  corre¬ 
lation  value,  then  this  iteration  process  may  stop  too  soon,  even 
though  a  better  design  matrix  may  exist. 

[Received  July  2003.  Revised  October  2005.] 


REFERENCES 

Akaike,  H.  (1974),  “A  New  Look  at  the  Statistical  Model  Identification,”  IEEE 
Transactions  on  Automatic  Control,  AC- 19, 716-723. 

Appleget,  J.  A.  (1995),  "The  Combat  Simulation  of  Desert  Storm  With  Applica¬ 
tions  for  Contingency  Operations,”  Naval  Research  Logistics,  42, 691-713. 

Bankes,  S.  (1993),  “Exploratory  Modeling  for  Policy  Analysis,”  Operations 
Research,  41, 435-449. 

Bettonvil.  B.  (1995),  “Factor  Screening  by  Sequential  Bifurcation,”  Communi¬ 
cations  in  Statistics:  Simulation,  24,  165-185. 

Bettonvil,  B.,  and  Kleijnen,  J.  P.  C.  (1997),  “Searching  for  Important  Factors  in 
Simulation  Models  With  Many  Factors:  Sequential  Bifurcation,”  European 
Journal  of  Operatiorml  Research,  96,  180-194. 

Chaloner,  K.,  and  Verdinelli,  I.  (19^),  “Bayesian  Experimental  Design:  A  Re¬ 
view,”  Technical  Report  599,  Carnegie  Mellon  University,  Dept,  of  Statistics. 

Cioppa,  T.  M.  (2002),  “Efficient  Nearly  Orthogonal  and  Space-Filling  Designs 
for  High-Dimensional  Complex  Models,”  unpublished  doctoral  dissertation. 
Naval  Postgraduate  School,  Monterey,  CA;  available  at  http:Mibrary.nps. 
navy.mil/uhtbin/cgisirsi/Sun+Jul-t-13+19:36:20+PDT+2003/0/S20A)2sep_ 
Cioppa_PhD.pdf.  Accessed  May  25, 2004. 

Cioppa,  T.,  and  Brown,  L.  (2003),  “Agent-Based  Models  as  an  Exploratory 
Analysis  Approach  to  Combat  Simulations,”  technical  report,  U.S.  Army 
Training  and  Doctrine  Command  Analysis  Center,  Monterey,  CA. 

Conover,  W.  J.  (1999),  Practical  Nonparametric  Statistics,  New  York:  Wiley. 

Cressie,  N.  (1993),  Statistics  for  Spatial  Data,  New  York:  Wiley. 

Defense  Modeling  and  Simulation  Office,  Joint  Staff,  available  at  www. 
dmso.mil.  Accessed  May  25, 2(X)4. 

Department  of  the  Army  (19W),  Field  Manual  100-23,  Peace  Operations, 
Washington,  E)C. 

Dorfman,  R.  (1943),  “The  Detection  of  Defective  Numbers  of  Large  Popula¬ 
tions,”  Die  Annals  of  Mathematical  Statistics,  14, 436-440. 

Fang,  K.  T.  (2005),  available  at  http:/Avww.math.hkbu.edu.hk/~ktfang/.  Ac¬ 
cessed  September  20,  2005. 

Fang,  K.  T.,  Lin,  D.  K.  J.,  Winker,  P,  and  2%ang,  Y.  (2000a),  “Uniform  Design: 
Theory  and  Application,”  Technometrics,  42, 237-248. 

Fang,  K.  T.,  Ma,  C.,  and  Winker,  P.  (2000b),  “Centered  L2 -Discrepancy  of  Ran¬ 
dom  Sampling  and  Latin  Hypercube  Design,  and  Construction  of  Uniform 
Designs,”  Mathematics  of  Computation,  71, 275-296. 


Fang,  K.  T.,  and  Wang,  Y.  (1994),  Number-Theoretic  Methods  in  Statistics,  Lon¬ 
don:  Chapman  &  Hall. 

Florian,  A.  0992),  “An  Efficient  Sampling  Scheme:  Updated  Latin  Hypercube 
Sampling,”  Probabilistic  Engineering  Mechanics,  7,  123-130. 

Golub,  G.  H.,  and  Van  Loan,  C.  F.  (1983),  Matrix  Computations,  Baltimore: 
Johns  Hopkins  University  Press. 

Hickemell,  F.  J.  (1998),  “A  Generalized  Discrepancy  and  (Quadrature  Error 
Bound,”  Mathematics  of  Computation,  67, 299-322. 

Iman,  R.  L.,  and  Conover,  W.  J.  (1980),  “Small-Sample  Sensitivity  Analysis 
Techniques  for  Computer  Models,  With  an  Application  to  Risk  Assessment,” 
Communications  in  Statistics— Theory  and  Methods,  A9, 1749-1845. 

Ipekci,  A.  I.  (2002),  “How  Agent-Bas^  Models  Can  Be  Utilized  to  Explore 
and  Exploit  Non-Linearity  and  Intangibles  Inherent  in  Guerrilla  Warfare,” 
unpublished  masters  thesis.  Naval  Postgraduate  School,  Monterey,  CA. 

Johnson,  M.,  Moore,  L.,  and  Ylvisaker,  D.  (1990),  “Minimax  and  Maximin  Dis¬ 
tance  Designs,”  Journal  of  Statistical  Planning  and  Inference,  26, 131-148. 

Kim,  H.,  and  Loh,  W.  (2003),  “Classification  Trees  With  Bivariate  Linear 
Discriminant  Node  Models,”  Computational  and  Graphical  Statistics,  12, 
512-530. 

Kleijnen,  J.  P.  C.,  Sanchez,  S.  M.,  Lucas,  T.  W.,  and  Cioppa,  T.  M.  (2{X)5), 
“A  User’s  Guide  to  the  Brave  New  World  of  Designing  Simulation  Experi¬ 
ments,”  INFORMS  Journal  on  Computing,  17, 263-289. 

Koehler,  J.  R.,  and  Owen,  A.  B.  (1996),  “Computer  Experiments,”  in  Handbook 
of  Statistics,  Vol.  13,  eds.  S.  Ghosh  and  C.  R.  Rao,  Amsterdam:  Elsevier, 
pp.  261-308. 

Lauren,  M.  K.,  and  Stephen,  R.  T.  (2001),  Map  Aware  Non-Uniform  Automata 
Version  I.O  Users  Manual,  Auckland,  New  Zealand:  New  Z^and  Defence 
Technology  Agency. 

Lin,  D.  K.  J.  (1993),  “A  New  Class  of  Supersaturated  Designs,”  Technometrics, 
35, 28-31. 

Loerch,  A.  G.,  Pudwill,  R.  A.,  and  LaBarbera,  L.  (19%),  “Army  Program  Value 
Added  Analysis  98-03  (VAA  98-03),”  Vol.  I,  Report  CAA-SR-96-1,  U.S. 
Army  Concepts  Analysis  Agency. 

Lucas,  T.  W.,  and  Sanchez,  S.  M.  (2005),  available  at  http://diana.cs.nps. 
navy.mil/SeedLab/.  Accessed  September  20, 2005. 

Lucas,  T.  W.,  Sanchez,  S.  M.,  Brown,  L.,  and  Vinyard,  W.  (2002),  “Better 
Designs  for  High-Dimensional  Explorations  of  Distillations,”  in  Maneuver 
Warfare  Science  2002,  eds.  G.  Home  and  S.  Johnson,  Quantico,  VA:  USMC 
Project  Albert,  pp.  17-45. 

McKay,  M.  D.,  Beckman,  R.  J.,  and  Conover,  W.  J.  (1979),  “A  Comparison 
of  Three  Methods  for  Selecting  Values  of  Input  Variables  in  the  Analysis  of 
Output  From  a  Computer  Code,”  Technometrics,  21, 239-245. 

Meyers,  R.  H.,  and  Montgomery,  D.  C.  (2002),  Response  Surface  Methodology: 
Process  and  Product  Optimization  Using  Designed  Experiments,  New  York: 
Wiley. 

Morris,  M.  D.,  and  Mitchell,  T.  J.  (1995),  “Exploratoiy  Designs  for  Compu¬ 
tational  Experiments,”  Journal  of  Statistical  Plarming  and  Irrference,  43, 
381-402. 

Owen,  A.  B.  (1994),  “Controlling  Correlations  in  Latin  Hypercube  Samples,” 
Journal  of  the  American  Statistical  Association,  89,  1517-1522. 

Piplani,  L.  K.,  Mercer,  J.  G.,  and  Roop,  R.  O.  (1994),  Systems  Acquisition  Man¬ 
ager's  Guide  for  the  Use  of  Models  and  Simulations,  Fort  Belvoir,  VA:  De¬ 
fense  Systems  Management  College  Press. 

Plackett,  R.  L.,  and  Burman,  J.  P.  (1946),  “Design  of  Optimal  Multifactorial 
Experiments,”  Biometrika,  23, 305-325. 

Ryan,  T.  P.  (1997),  Modem  Regression  Methods,  New  York:  Wiley. 

Saeger,  K.  J.,  and  Hinch,  J.  H.  (2001),  “Understanding  Instability  in  a  Complex 
Deterministic  Combat  Simulation,”  Military  Operations  Research,  6, 43^55. 

Sanmer,  T.  J.,  Williams,  B.  J.,  and  Notz,  W.  (2003),  The  Design  and  Analysis  of 
Computer  Experiments,  New  York:  Springer- Verlag. 

SAS  Institute  Inc.  (2004),  JMP  User's  Manual,  Cary,  NC:  Author. 

Schruben,  L.  W.  (1986),  “Simulation  Optimization  Using  Frequency  Do¬ 
main  Methods,”  in  Proceedings  of  the  1986  Winter  Simulation  Conference, 
pp.  396-369. 

Srivastava,  J.  N.  (1975),  “Design  for  Searching  Non-Negligible  Effects,”  in  A 
Survey  of  Statistical  Design  and  Linear  Modeis,  Amsterdam:  North  Holland, 
pp.  507-519. 

Ye,  K.  Q.  (2005),  available  at  http://www.ams.sunysb.edu/~kye/olh.html.  Ac¬ 
cessed  September  20, 2005. 

-  (1998a),  “Orthogonal  Colunrn  Latin  Hypercubes  and  Their  Applica¬ 
tion  in  Computer  Experiments,”  Journal  of  the  American  Statistical  Associ¬ 
ation,  93,  1430-1439. 

- (1998b),  “On  the  Structure  of  Orthogonal  Latin  Hypercubes,”  technical 

report.  University  of  Michigan,  Department  of  Statistics,  Ann  Arbor,  MI. 


TECHNOMETRICS,  FEBRUARY  2007,  VOL.  49,  NO.  1 


Reproduced  with  permission  of  the  copyright  owner.  Further  reproduction  prohibited  without  permission. 


